Understanding the Flow of Content in Summarizing HTML Documents

Authors

A. F. R. Rahman

H. Alam

Abstract

In recent times, the way people access information from the web has undergone a transformation. The demand for information to be accessible from anywhere, anytime, has led to Personal Digital Assistants (PDAs) and cellular phones that can browse the web and find information over wireless connections. However, the small display form factor of these portable devices greatly reduces the rate at which web sites can be browsed. This creates a need for efficient algorithms that extract the content of web pages and build a faithful reproduction of the original pages with the important content intact.

Similar resources

Efficient Algorithm for Mining on Bio Medical Data for Ranking the Web Pages

Information on the internet is growing in volume and comes from many different sources. Extracting tuples from HTML pages has been an important issue in various web applications such as web data integration, e-commerce market monitoring, and mashups that repurpose and selectively combine existing web data services. Data mining is the process of analyzing data from different perspectives and...

Detecting Tables in HTML Documents

Tables are a commonly used presentation scheme, especially for describing relational information. Table understanding on the web has many potential applications including web mining, knowledge management, and web content summarization and delivery to narrow-bandwidth devices. Although in HTML documents tables are generally marked as <table> elements, often the <table> tag is used liberally to ach...

MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)

HTML [RFC 1866] defines a powerful means of specifying multimedia documents. These multimedia documents consist of a text/html root resource (object) and other subsidiary resources (image, video clip, applet, etc. objects) referenced by Uniform Resource Identifiers (URIs) within the text/html root resource. When an HTML multimedia document is retrieved by a browser, each of these component resou...

Automatic Annotation of Content-Rich HTML Documents: Structural and Semantic Analysis

Although RDF/XML has been widely recognized as the standard vehicle for representing semantic information on the Web, an enormous amount of semantic data is still being encoded in HTML documents that are designed primarily for human consumption and not directly amenable to machine processing. This paper seeks to bridge this semantic gap by addressing the fundamental problem of automatically ann...

Publication date: 2001
